We examine deterministic and nondeterministic state complexities of regular operations on
prefix-free languages. We strengthen several results by providing witness languages over smaller
alphabets, usually as small as possible. We then provide tight bounds on the state complexity of
symmetric difference, and on the deterministic and nondeterministic state complexity of difference
and cyclic shift of prefix-free languages.

1 Introduction 

A language is prefix-free if for every string in the language, no proper prefix of the string is in the
language. The deterministic and nondeterministic state complexity of basic operations on prefix-free
regular languages has recently been studied by Han and Salomaa [5, 6]. The two papers follow current
research that focuses on complexity in various subclasses of regular languages [1, 2, 3, 4].
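
As a minimal illustration of the definition, the following Python sketch checks prefix-freeness of a finite sample of a language. The helper names `is_prefix_free` and `sample` are hypothetical and not from the paper. A finite sample can only refute prefix-freeness of an infinite language, never establish it.

```python
import re
from itertools import product

def is_prefix_free(words):
    """Return True if no word of the finite set is a proper prefix of another."""
    ws = sorted(set(words))
    # In lexicographic order, a word that is a prefix of any later word
    # is in particular a prefix of its immediate successor.
    return not any(b.startswith(a) for a, b in zip(ws, ws[1:]))

def sample(pattern, alphabet="ab", max_len=6):
    """All strings over `alphabet` of length <= max_len that match `pattern`."""
    return ["".join(t) for k in range(max_len + 1)
            for t in product(alphabet, repeat=k)
            if re.fullmatch(pattern, "".join(t))]

print(is_prefix_free(sample(r"(a*b){2}")))  # True: every string has exactly two b's and ends in b
print(is_prefix_free(sample(r"(a*b)+")))    # False: "b" is a proper prefix of "bb"
```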

Here we continue this research and study the descriptional complexity of regular operations in the
class of prefix-free regular languages. We strengthen several results on state complexity in [5, 6] by
providing witness languages over smaller alphabets, usually as small as possible. We also correct some
errors in these two papers; in particular, the binary automata used for the result on reversal do not
provide the claimed lower bound. We next study the state complexity of difference, symmetric
difference, and cyclic shift, and provide tight bounds.

In the second part of the paper, we examine the nondeterministic state complexity of regular oper- 
ations. We introduce a new fooling-set lemma, which allows us to give a correct proof for union, and 
to get the tight bound for cyclic shift. The idea behind the lemma is to find a fooling-set for a regular 
language and then show that one more state is necessary by finding two appropriate strings. We prove 
tight bounds on the nondeterministic state complexity of all basic operations including difference and 
cyclic shift. 

2 State Complexity in Prefix-Free Languages 

We start with an investigation of the state complexity of regular operations on prefix-free languages.
The languages are represented by minimal dfa's, thus each of the dfa's has exactly one final state going
to the dead state on every input symbol [6]. Then an operation is applied, and we ask how many states,
depending on the state complexities of the operands, are sufficient and necessary in the worst case for
a dfa to accept the language resulting from the operation. The next theorem provides the tight bounds
for Boolean operations. In the case of union and intersection, the upper bounds are from [6], where
witness languages were defined over a three- and four-letter alphabet, respectively. We provide binary
witnesses for both operations. Then we study symmetric difference and difference, and get the tight
bounds in the binary and ternary case, respectively.

*Research supported by VEGA grant 2/0111/09

I. McQuillan and G. Pighizzini (Eds.): 12th International Workshop
on Descriptional Complexity of Formal Systems (DCFS 2010)
EPTCS 31, 2010, pp. 197-204, doi:10.4204/EPTCS.31.22

© G. Jiraskova & M. Krausova

Complexity in Prefix-Free Regular Languages

Figure 1: The prefix-free dfa's meeting the bound $mn - 2(m+n) + 6$ for intersection.

Theorem 1 (Boolean Operations) Let $m, n \ge 3$ and let $K$ and $L$ be prefix-free regular languages with
$\mathrm{sc}(K) = m$ and $\mathrm{sc}(L) = n$. Then

1. $\mathrm{sc}(K \cap L) \le mn - 2(m+n) + 6$, and the bound is tight in the binary case;

2. $\mathrm{sc}(K \cup L) \le mn - 2$, and the bound is tight in the binary case;

3. $\mathrm{sc}(K \oplus L) \le mn - 2$, and the bound is tight in the binary case;

4. $\mathrm{sc}(K \setminus L) \le mn - m - 2n + 4$, and the bound is tight in the ternary case.

Proof. Let the dfa's have states $0, 1, \ldots, m-1$ and $0, 1, \ldots, n-1$, of which $m-2$ and $n-2$
are final, and $m-1$ and $n-1$ are dead. The initial state is $0$.

1. For tightness, consider the binary prefix-free dfa's of Figure 1. The strings $a^{m-3}b^{n-3}a$,
$a^{m-3}b^{n-3}aa$, and $a^i b^j$ with $0 \le i \le m-3$ and $0 \le j \le n-3$ are pairwise distinct in
the right-invariant congruence defined by the language $K \cap L$.

2. Let $K = (a^*b)^{m-2}$ and $L = (b^*a)^{n-2}$. The strings $b^i a^j$ with $0 \le i \le m-3$ and
$0 \le j \le n-1$, $a^j b^{m-2}$ and $a^j b^{m-1}$ with $0 \le j \le n-3$, and $a^{n-3}b^{m-2}a$ and
$a^{n-3}b^{m-2}aa$ are pairwise distinct for $K \cup L$.

3. In the cross-product automaton for symmetric difference, the rejecting state $(m-2, n-2)$ is
equivalent to the dead state, and states $(m-2, n-1)$ and $(m-1, n-2)$ accept only $\varepsilon$. The
same languages as for union meet the bound.

4. All the states of the cross-product automaton in the last row and state $(m-2, n-2)$ are dead, and
the other states in the last but one row accept only $\varepsilon$. Pairs $(i, n-2)$ and $(i, n-1)$ are
equivalent as well. This gives the upper bound, which is met by $K = (b^*(a+c))^{m-2}$ and
$L = ((a+c)^*b)^{n-3} c^*(a+b)$. □
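
The witnesses for union, symmetric difference, and difference can be checked mechanically for small $m$ and $n$. The following Python sketch builds the cross-product automaton, keeps its reachable part, and counts the classes of a Moore-style partition refinement. The function names (`power_dfa`, `difference_L`, `minimal_size`, `sc_union`, `sc_symdiff`, `sc_difference`) are illustrative assumptions, not from the paper, and such a finite check does not replace the proof.

```python
def power_dfa(k, loops, steps, alphabet):
    """Complete dfa for (loops* steps)^k with states 0..k+1:
    state k is the sole final state and state k+1 is the dead state."""
    delta = {}
    for q in range(k + 2):
        for x in alphabet:
            if q < k and x in loops:
                delta[q, x] = q
            elif q < k and x in steps:
                delta[q, x] = q + 1
            else:
                delta[q, x] = k + 1
    return delta, {k}

def difference_L(n):
    """Complete n-state dfa for ((a+c)* b)^(n-3) c* (a+b)."""
    delta, final = power_dfa(n - 2, "ac", "b", "abc")
    delta[n - 3, "a"] = n - 2  # in state n-3 only c loops; a and b reach the final state
    return delta, final

def minimal_size(A, B, accept, alphabet):
    """Number of states of the minimal dfa equivalent to the cross-product
    of A and B; `accept` combines the two acceptance bits."""
    (d1, f1), (d2, f2) = A, B
    step = lambda s, x: (d1[s[0], x], d2[s[1], x])
    reach, todo = {(0, 0)}, [(0, 0)]
    while todo:  # reachable part of the product
        s = todo.pop()
        for x in alphabet:
            t = step(s, x)
            if t not in reach:
                reach.add(t)
                todo.append(t)
    block = {s: int(accept(s[0] in f1, s[1] in f2)) for s in reach}
    count = len(set(block.values()))
    while True:  # refine until the number of classes stops growing
        ids, new = {}, {}
        for s in reach:
            sig = (block[s],) + tuple(block[step(s, x)] for x in alphabet)
            new[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = new, len(ids)

def sc_union(m, n):
    K = power_dfa(m - 2, "a", "b", "ab")  # (a*b)^(m-2)
    L = power_dfa(n - 2, "b", "a", "ab")  # (b*a)^(n-2)
    return minimal_size(K, L, lambda x, y: x or y, "ab")

def sc_symdiff(m, n):
    K = power_dfa(m - 2, "a", "b", "ab")
    L = power_dfa(n - 2, "b", "a", "ab")
    return minimal_size(K, L, lambda x, y: x != y, "ab")

def sc_difference(m, n):
    K = power_dfa(m - 2, "b", "ac", "abc")  # (b*(a+c))^(m-2)
    return minimal_size(K, difference_L(n), lambda x, y: x and not y, "abc")

for m, n in [(3, 3), (3, 4), (4, 3), (4, 4)]:
    print(m, n, sc_union(m, n), m * n - 2, sc_difference(m, n), m * n - m - 2 * n + 4)
```

Each printed row lists the computed state complexity next to the corresponding bound of Theorem 1, so the two columns of each pair can be compared directly.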

We now continue with concatenation and star, and slightly improve the results from [6] by providing 
unary witnesses for concatenation, and the complexity of star in the unary case. 

Theorem 2 (Concatenation and Star) Let $m, n \ge 2$ and let $K$ and $L$ be prefix-free regular languages
with $\mathrm{sc}(K) = m$ and $\mathrm{sc}(L) = n$. Then

1. $\mathrm{sc}(KL) \le m + n - 2$, and the bound is tight in the unary case;

2. $\mathrm{sc}(L^*) \le n$. The bound is tight in the binary case if $n \ne 3$.
The tight bound for star in the unary case is $n - 2$ if $n \ge 3$.

Proof. 1. We can get a dfa for the concatenation from the two dfa's as follows [6]. We remove the dead
state from the first dfa, and merge the final state of the first dfa with the initial state of the second
dfa. All transitions going from a non-final state of the first dfa to the dead state will go to the dead
state of the second dfa. The resulting automaton is a dfa of $m + n - 2$ states for concatenation. The
bound is met by the unary prefix-free languages $a^{m-2}$ and $a^{n-2}$.

2. We make the final state initial, and redirect the transitions from the final state to the states to
which they go from the start state. The resulting dfa for star has at most $n$ states. The upper bound is
met by the binary prefix-free language $(a^{n-2})^* b$ [6]. In the unary case, if $n \ge 3$, the only
$n$-state dfa prefix-free language is $a^{n-2}$. The star of this language, $(a^{n-2})^*$, is an
$(n-2)$-state dfa language. □
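
The star construction in part 2 and the unary witnesses in part 1 can be tried out on small instances. The sketch below is a hedged illustration: the names `minimal_size`, `unary_word_dfa`, `star_witness_dfa`, and `star_dfa` are hypothetical, and the minimization is a plain partition refinement on the reachable states.

```python
def minimal_size(delta, start, finals, alphabet):
    """Number of states of the minimal complete dfa (reachable part, then refinement)."""
    reach, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for x in alphabet:
            r = delta[q, x]
            if r not in reach:
                reach.add(r)
                todo.append(r)
    block = {q: int(q in finals) for q in reach}
    count = len(set(block.values()))
    while True:
        ids, new = {}, {}
        for q in reach:
            sig = (block[q],) + tuple(block[delta[q, x]] for x in alphabet)
            new[q] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = new, len(ids)

def unary_word_dfa(k):
    """Complete dfa for the single word a^k: states 0..k+1, k final, k+1 dead."""
    delta = {(q, "a"): min(q + 1, k + 1) for q in range(k + 2)}
    return delta, 0, k

def star_witness_dfa(n):
    """Complete n-state dfa for (a^(n-2))* b with n >= 3:
    states 0..n-3 form an a-cycle, n-2 is final, n-1 is dead."""
    f, dead = n - 2, n - 1
    delta = {}
    for q in range(n):
        delta[q, "a"] = (q + 1) % (n - 2) if q < f else dead
        delta[q, "b"] = f if q == 0 else dead
    return delta, 0, f

def star_dfa(delta, start, final, alphabet):
    """Part 2 of the proof: the final state becomes initial and copies
    the outgoing transitions of the old start state."""
    new = dict(delta)
    for x in alphabet:
        new[final, x] = delta[start, x]
    return new, final, {final}

def sc_star_binary(n):
    return minimal_size(*star_dfa(*star_witness_dfa(n), "ab"), "ab")

def sc_star_unary(n):
    return minimal_size(*star_dfa(*unary_word_dfa(n - 2), "a"), "a")

def sc_unary_word(k):
    delta, start, final = unary_word_dfa(k)
    return minimal_size(delta, start, {final}, "a")

for n in range(3, 8):
    print(n, sc_star_binary(n), sc_star_unary(n))
```

For $n = 3$ the binary witness degenerates to $a^*b$, and the dead state becomes unreachable after the construction, which is consistent with the restriction $n \ne 3$ in the theorem.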

Before dealing with reversal, let us investigate nfa-to-dfa conversion. We recall the result from [1],
Theorem 19, which uses the proof of Theorem 6, which in turn uses Moore's proof in [10]. We present
different ternary witnesses, and give a simple proof. Then we show that the bound cannot be met in the
binary case.

Theorem 3 (NFA to DFA Conversion) Let $n \ge 3$ and let $L$ be a prefix-free language with
$\mathrm{nsc}(L) = n$. Then $\mathrm{sc}(L) \le 2^{n-1} + 1$. The bound is tight in the ternary case,
but cannot be met in the binary case.

Proof. Consider an $n$-state nfa recognizing a non-empty prefix-free language. The corresponding
minimal dfa has exactly one final state, and so we can merge all final states in the subset automaton.
This gives the upper bound $2^{n-1} + 1$.
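
The count behind this upper bound can be spelled out as follows. This is a sketch under the assumption that the nfa has $k \ge 1$ final states and that all subsets containing a final state have been merged into a single state of the subset automaton; the symbol $k$ is introduced here only for illustration.

```latex
\[
  \underbrace{2^{\,n-k}}_{\text{subsets without a final state}}
  \;+\;
  \underbrace{1}_{\text{merged accepting subsets}}
  \;\le\; 2^{\,n-1} + 1
  \qquad (k \ge 1).
\]
```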

For tightness, consider the ternary nfa of Figure 2. In the corresponding subset automaton, each
singleton set and the empty set are reachable. Each set $\{i_1, i_2, \ldots, i_k\}$ with
$0 \le i_1 < i_2 < \cdots < i_k \le n-2$ of size $k$ is reached from the set
$\{i_2 - i_1, i_3 - i_1, \ldots, i_k - i_1\}$ of size $k-1$ by $ba^{i_1}$. Since for each state $i$, the
string $a^{n-2-i}c$ is accepted by the nfa only from state $i$, no two different states of the subset
automaton are equivalent.

Now consider the binary case. In a minimal binary $n$-state prefix-free nfa, denote by $n$ the final
state, and by $n-1$ a state that goes to $n$ by a symbol $a$. In the corresponding subset automaton,
there must be a state $i$ in $\{1, 2, \ldots, n-1\}$ that goes to a non-empty subset $S$ of
$\{1, 2, \ldots, n-1\}$ by symbol $a$ because otherwise the nfa on states $\{1, 2, \ldots, n-1\}$ would
be unary, and so the number of reachable states in the corresponding subset automaton could not be
$2^{n-1}$. Since all subsets of $\{1, 2, \ldots, n-1\}$ must be reachable, the subset $\{i, n-1\}$ is
reachable. However, subset $\{i, n-1\}$ goes by $a$ to a superset of $S \cup \{n\}$, which is accepting
and in turn goes by a non-empty string to an accepting state. This contradicts the prefix-freeness of
the accepted language. □

In the case of reversal, the result in [6] uses binary dfa's from [11]. It is claimed in [11, Theorem 3]
that the automata meet the upper bound $2^n$ on the state complexity of reversal. However, this is not
true. In the case of $n = 8$, with initial and final state 1, the number of reachable states in the
subset automaton corresponding to the reverse of the dfa is 252 instead of 256: subsets $\{1,4,5,8\}$,
$\{2,5,6,1\}$, $\{3,6,7,2\}$, and $\{4,7,8,3\}$ cannot be reached from any subset by $b$ since each of
them contains exactly one of states 1 and 3; and by $a$, there is a cycle among these states. A similar
reasoning shows that, whenever $n = 8 + 4k$, the automata with the initial and final state 1 in [11] do
not meet the bound $2^n$. The binary automata with a single accepting state meeting the upper bound for
reversal have recently been presented in [12]. We use them to get correct ternary prefix-free witnesses
for reversal.

Figure 2: The prefix-free nfa meeting the bound $2^{n-1} + 1$ for nfa-to-dfa conversion.

Figure 3: The binary dfa requiring $2^{n-2}$ states for reversal.



Theorem 4 (Reversal) Let $n \ge 4$ and let $L$ be a prefix-free regular language with $\mathrm{sc}(L) = n$.
Then $\mathrm{sc}(L^R) \le 2^{n-2} + 1$. The bound is tight in the ternary case, but cannot be met in
the binary case.

Proof. We first construct an nfa for the reversal from the given dfa by removing the dead state,
reversing all transitions, and switching the role of the initial and final state. Since no transition in
the resulting nfa goes to the initial state, the corresponding subset automaton has at most
$2^{n-2} + 1$ states.

For tightness, first consider the binary dfa of $n-2$ states depicted in Figure 3. It has been shown in
[12] that the reversal of the language recognized by this dfa requires $2^{n-2}$ states. Now change the
dfa as follows. Add two more states $n-1$ and $n$. State $n-1$ will be the sole final state, while state
$n$ will be dead. Define transitions on a new symbol $c$: state 2 goes to the new final state $n-1$ by
$c$, and each other state goes to the dead state $n$. The resulting automaton is a prefix-free ternary
$n$-state dfa requiring $2^{n-2} + 1$ deterministic states for reversal.

Now consider the binary case. Let $L$ be a binary prefix-free witness language. Then
$\mathrm{nsc}(L^R) \le n-1$ because the minimal dfa for $L$ has the dead state. Since
$\mathrm{sc}(L^R) = 2^{n-2} + 1$, language $L^R$ is a binary witness for nfa-to-dfa conversion.
Theorem 3 shows that this cannot happen. □
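
The construction in the first paragraph of the proof can be sketched in Python. The example language $(a+b)^{k-1} a (a+b)^* c$ is not one of the paper's witnesses; it is a hypothetical prefix-free language chosen here because its reversal already grows exponentially. The names `kth_symbol_dfa` and `reversal_sc` are likewise assumptions.

```python
def kth_symbol_dfa(k):
    """Complete dfa for (a+b)^(k-1) a (a+b)* c with k >= 1.
    States 0..k, final state k+1, dead state k+2, so n = k+3 states."""
    final, dead = k + 1, k + 2
    delta = {(q, x): dead for q in range(k + 3) for x in "abc"}
    for q in range(k - 1):
        delta[q, "a"] = delta[q, "b"] = q + 1
    delta[k - 1, "a"] = k
    delta[k, "a"] = delta[k, "b"] = k
    delta[k, "c"] = final
    return delta, 0, final, dead

def reversal_sc(delta, start, final, dead, alphabet):
    """Size of the minimal dfa for the reversal: drop the dead state, reverse
    all transitions, swap initial and final state, determinize, minimize."""
    rev = {}
    for (q, x), r in delta.items():
        if q != dead and r != dead:
            rev.setdefault((r, x), set()).add(q)
    step = lambda S, x: frozenset(p for q in S for p in rev.get((q, x), ()))
    init = frozenset([final])
    reach, todo = {init}, [init]
    while todo:  # subset construction on the reversed nfa
        S = todo.pop()
        for x in alphabet:
            T = step(S, x)
            if T not in reach:
                reach.add(T)
                todo.append(T)
    block = {S: int(start in S) for S in reach}  # accepting iff old start is in S
    count = len(set(block.values()))
    while True:  # partition refinement
        ids, new = {}, {}
        for S in reach:
            sig = (block[S],) + tuple(block[step(S, x)] for x in alphabet)
            new[S] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = new, len(ids)

for k in range(1, 5):
    n = k + 3
    print(n, reversal_sc(*kth_symbol_dfa(k), "abc"), 2 ** (n - 2) + 1)
```

Each row prints $n$, the computed state complexity of the reversal, and the upper bound $2^{n-2} + 1$; the example stays below the bound, as expected for a language that is not a witness.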

The state complexity of cyclic shift was examined in [9], where the upper and lower bounds are
only asymptotically tight. The next theorem provides the tight bound for this operation in the class of
prefix-free regular languages.

Theorem 5 (Cyclic Shift) Let $L$ be a prefix-free language with $\mathrm{sc}(L) = n$. Then
$\mathrm{sc}(L^{cs}) \le (2n-3)^{n-2}$. The bound is tight for a six-letter alphabet.

Proof. Consider an $n$-state dfa for a prefix-free language $L$ with states $1, 2, \ldots, n$, of which 1
is the initial state, and $n-1$ is the sole final state that goes to the dead state $n$ on each symbol.
If a string $w$ is in the language $L^{cs}$, then $w = uv$ for some strings $u, v$ such that $vu \in L$.
That is, the initial state 1 goes to a state $i$ by $v$, and then from state $i$ to the accepting state
$n-1$ by $u$. Thus, a string $uv$ is in $L^{cs}$ if and only if there is a state $i$ such that $i$ goes to
the accepting state $n-1$ by $u$, and the initial state 1 goes to state $i$ by $v$. Because of
prefix-freeness, state $i$ is less than $n-1$. Hence the cyclic shift is the union of $n-2$
concatenations $L(B_i)L(C_i)$, $i = 1, 2, \ldots, n-2$, where $B_i = (Q, \Sigma, \delta, i, \{n-1\})$ and
$C_i = (Q, \Sigma, \delta, 1, \{i\})$ (cf. [9]). Each such concatenation is recognized by a dfa of $2n-3$
states since we first remove the dead state from $B_i$, then merge the final state of $B_i$ and the
initial state of $C_i$, and finally merge states $n-1$ and $n$ in $C_i$ since they are dead. Thus we have
the union of $n-2$ dfa's, each of which has $2n-3$ states, which gives the upper bound $(2n-3)^{n-2}$.
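
The decomposition of the cyclic shift into $n-2$ concatenations can be compared with the definition by brute force on a small example. The sketch below uses the 4-state dfa of $(a^*b)^2$, renumbered as in the proof; the function names are hypothetical, and the check covers only strings up to a fixed length.

```python
from itertools import product

def example_dfa():
    """4-state dfa for (a*b)^2 in the numbering of the proof:
    1 initial, n-1 = 3 final, n = 4 dead."""
    delta = {}
    for q in (1, 2):
        delta[q, "a"], delta[q, "b"] = q, q + 1
    for q in (3, 4):
        delta[q, "a"] = delta[q, "b"] = 4
    return delta, 4

def run(delta, q, w):
    """State reached from q after reading w."""
    for x in w:
        q = delta[q, x]
    return q

def by_definition(delta, n, w):
    """w is in the cyclic shift iff w = uv for some u, v with vu in L."""
    return any(run(delta, 1, w[k:] + w[:k]) == n - 1 for k in range(len(w) + 1))

def by_decomposition(delta, n, w):
    """w = uv with u in L(B_i) and v in L(C_i) for some state i <= n-2."""
    for k in range(len(w) + 1):
        u, v = w[:k], w[k:]
        i = run(delta, 1, v)  # v is in L(C_i)
        if i <= n - 2 and run(delta, i, u) == n - 1:  # u is in L(B_i)
            return True
    return False

def decomposition_agrees(max_len=7):
    delta, n = example_dfa()
    words = ("".join(t) for k in range(max_len + 1) for t in product("ab", repeat=k))
    return all(by_definition(delta, n, w) == by_decomposition(delta, n, w) for w in words)

print(decomposition_agrees())
```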

For tightness, set $m = n - 2$ and let $\Sigma = \{a, b, c, d, g, h\}$. Define a prefix-free dfa over
$\Sigma$ of $n$ states $1, 2, \ldots, m, m+1, m+2$, of which 1 is the initial state, and $m+1$ is the sole
accepting state that goes to the dead state $m+2$ by each symbol; for states $1, 2, \ldots, m$, the
transitions, except for symbol $d$, are defined as in Figure 4. Next, by $d$, state $m$ goes to state
$m+1$, and each other state goes to itself. The proof proceeds by showing the reachability and
inequivalence of all $m$-tuples in the subset automaton corresponding to the $m(2m+1)$-state nfa for
cyclic shift. □

Figure 4: The transitions on a,b,c,g,h in the prefix-free witness dfa for cyclic shift.

3 Nondeterministic State Complexity 

This section deals with the nondeterministic state complexity of regular operations on prefix-free lan- 
guages. This time, the languages are represented by nfa's. The nfa's have exactly one final state that 
goes to the empty set by each symbol. However, such an nfa is not guaranteed to accept a prefix-free
language. On the other hand, if such an nfa is a partial dfa, then it accepts a prefix-free language,
since to get the prefix-free dfa for the language we only need to add a dead state. If the accepted
language consists of strings ending in a symbol that does not occur anywhere else in the string, then
the language is prefix-free as well.

We ask how many states, depending on the nondeterministic state complexity of the operands,
are sufficient and necessary in the worst case for an nfa with a single initial state to accept the
language resulting from some operation. To prove the results we use a fooling set lower-bound technique.
A set of pairs of strings $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is called a fooling set for a
language $L$ if (1) for all $i$, the string $x_i y_i$ is in the language $L$, and (2) if $i \ne j$, then
at least one of the strings $x_i y_j$ and $x_j y_i$ is not in the language $L$. It is well known that
the size of a fooling set for a regular language provides a lower bound on the number of states in any
nfa for this language. The next lemma shows that sometimes one more state is necessary.
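
Conditions (1) and (2) translate directly into a small checker. The following Python sketch assumes the language is given by a membership predicate; the name `is_fooling_set` is hypothetical. The example uses the pairs $(a^i, a^{k-1-i})$ for the single-word language $a^{k-1}$.

```python
def is_fooling_set(pairs, member):
    """Check conditions (1) and (2) for a list of pairs and a membership test."""
    for i, (x, y) in enumerate(pairs):
        if not member(x + y):  # condition (1)
            return False
        for u, v in pairs[:i]:  # condition (2) against all earlier pairs
            if member(x + v) and member(u + y):
                return False
    return True

k = 5
word = "a" * (k - 1)
pairs = [("a" * i, "a" * (k - 1 - i)) for i in range(k)]
print(is_fooling_set(pairs, lambda w: w == word), len(pairs))  # True 5
```

For this language the checker confirms a fooling set of size $k$, so every nfa for $a^{k-1}$ needs at least $k$ states.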

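Lemma 1 ([8]) Let $L$ be a regular language. Let $\mathcal{A}$ and $\mathcal{B}$ be sets of pairs of
strings and let $u$ and $v$ be two strings such that $\mathcal{A} \cup \mathcal{B}$,
$\mathcal{A} \cup \{(\varepsilon, u)\}$, and $\mathcal{B} \cup \{(\varepsilon, v)\}$ are fooling sets for
$L$. Then every nfa for $L$ has at least $|\mathcal{A}| + |\mathcal{B}| + 1$ states. □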

Theorem 6 (Boolean Operations) Let $m, n \ge 3$. Let $K$ and $L$ be prefix-free languages with
$\mathrm{nsc}(K) = m$ and $\mathrm{nsc}(L) = n$. Then

1. $\mathrm{nsc}(K \cup L) \le m + n$, and the bound is tight in the binary case;

2. $\mathrm{nsc}(K \cap L) \le mn - (m + n) + 2$, and the bound is tight in the binary case;

3. $\mathrm{nsc}(L^c) \le 2^{n-1}$, and the bound is tight in the ternary case;

4. $\mathrm{nsc}(K \setminus L) \le (m-1)2^{n-1} + 1$, and the bound is tight for a four-letter alphabet.

Proof. 1. Let $A$ and $B$ be $m$-state and $n$-state prefix-free nfa's with initial states $s_A$ and
$s_B$, and transition functions $\delta_A$ and $\delta_B$, respectively. To get an nfa for the union we
add a new initial state going to $\delta_A(s_A, a) \cup \delta_B(s_B, a)$ by each symbol $a$. Since both
automata are prefix-free, we can merge their final states. Therefore, the upper bound is $m + n$. To
prove tightness, consider the prefix-free languages $K = (a^{m-1})^* b$ and $L = (b^{n-1})^* a$ accepted
by an $m$-state and an $n$-state nfa, respectively. Let $\mathcal{A}$ and $\mathcal{B}$ be the following
sets of pairs of strings:

$\mathcal{A} = \{(a^{m-1}b, \varepsilon)\} \cup \{(a^i, a^{m-1-i}b) \mid i = 1, 2, \ldots, m-2\} \cup \{(a^{m-1}, a^{m-1}b)\}$,

$\mathcal{B} = \{(b^j, b^{n-1-j}a) \mid j = 1, 2, \ldots, n-2\} \cup \{(b^{n-1}, b^{n-1}a)\}$.

Let us show that the set $\mathcal{A} \cup \mathcal{B}$ is a fooling set for language $K \cup L$. The
concatenation of the first and the second part in each pair results in a string in
$\{a^{m-1}b, a^{2m-2}b, b^{n-1}a, b^{2n-2}a\}$. Each of these strings is in language $K \cup L$. If we
concatenate the first and the second part in two distinct pairs, we get a string in
$a^{m-1} b^+ a^+ (b + \varepsilon)$ or in
$\{a^r b, a^{m-1+r} b, b^s a, b^{n-1+s} a \mid 0 < r < m-1, 0 < s < n-1\}$ or a string in $a^+ b^+ a$.
None of them is in $K \cup L$. Next, $\mathcal{A} \cup \{(\varepsilon, b^{n-1}a)\}$ and
$\mathcal{B} \cup \{(\varepsilon, a^{m-1}b)\}$ are fooling sets for $K \cup L$. By Lemma 1, every nfa for
the union has at least $m + n$ states. Notice that the set of pairs in [5] is not a fooling set.

Figure 5: The prefix-free nfa's meeting the bound $mn - (m + n) + 2$ for intersection.

2. In the cross-product automaton for the intersection, no string is accepted from states $(i, n-1)$ and
$(m-1, j)$, except for the sole final state $(m-1, n-1)$. We can exclude all these states, and get an nfa
of $(m-1)(n-1) + 1$ states. For tightness, consider the binary prefix-free nfa's of Figure 5. The
languages are the same as in the deterministic case for intersection, but now they are accepted by
nfa's, and so we do not need dead states. Consider the cross-product nfa for the intersection of the two
languages, and let $Q$ be the set of its $(m-1)(n-1) + 1$ states excluding all states $(i, n-1)$ and
$(m-1, j)$, but including state $(m-1, n-1)$. For each state $q$ in $Q$, there exist strings $x_q$ and
$y_q$ such that the initial state goes only to state $q$ by $x_q$, and the string $y_q$ is accepted by
the cross-product only from state $q$. It follows that the set of pairs $\{(x_q, y_q) \mid q \in Q\}$ is
a fooling set for the intersection of the two languages since each string $x_q y_q$ is accepted by the
cross-product automaton, while $x_p y_q$ is not if $p \ne q$.

3. Let $N$ be an $n$-state nfa for a prefix-free language $L$. The equivalent minimal dfa $D$ has exactly
one final state $f$, from which all transitions go to the dead state $d$. It follows that dfa $D$ has at
most $2^{n-1} + 1$ states. After interchanging the accepting and rejecting states in dfa $D$, we get dfa
$D'$ for language $L^c$ with the same number of states as in $D$. In dfa $D'$, all states are accepting,
except for state $f$, and moreover, accepting state $d$ goes to itself by each symbol since it was dead
in $D$. The initial state $s$ of dfa $D'$ is accepting as well. Let us construct nfa $N'$ of $2^{n-1}$
states for $L^c$ from dfa $D'$ as follows. First, add a transition by a symbol $a$ from a state $q$ to
the initial (and accepting) state $s$, whenever there is a transition in dfa $D'$ from $q$ to $d$ by $a$
(in particular, add a transition from $d$ to $s$ by each symbol). Next, make state $d$ rejecting.
Finally, redirect all transitions going to state $f$ to state $d$, and remove state $f$ with all its
ingoing and outgoing transitions. The resulting language is the same, that is $L^c$, and the nfa $N'$ has $2^{n-1}$
states. The prefix-free language $L \cdot c$, where $L$ is the binary $(n-1)$-state nfa language
reaching the bound $2^{n-1}$ for complement [7], meets the bound since the set
$\{(x, yc) \mid (x, y) \text{ is in the fooling set for } L^c \text{ [7]}\}$ is a fooling set of size
$2^{n-1}$ for language $(L \cdot c)^c$.

4. The upper bound for the intersection of a prefix-free $m$-state nfa language and a regular $n$-state
nfa language is $(m-1)n + 1$, and the upper bound for $K \cap L^c$ then follows from part 3. For
tightness, first let $L'$ be the ternary $n$-state nfa prefix-free language from part 3 meeting the bound
$2^{n-1}$ for complement. Let $\mathcal{F}'$ be the fooling set for $(L')^c$ described in part 3. In each
state of the nfa for $L'$, except for the final state $n$, add a loop by $d$, and denote the resulting
prefix-free language by $L$. Next, define an $m$-state nfa prefix-free language $K$ by
$K = ((a+b)^* d)^{m-2} (a+b)^* c$. Consider the following set
$\mathcal{F} = \{(x d^i, d^{m-2-i} y) \mid (x, y) \in \mathcal{F}', i = 0, 1, \ldots, m-2\}$. For each
pair in $\mathcal{F}$, the string $x d^i d^{m-2-i} y$ is in $K$. The nfa for $L$, as well as the nfa for
$L'$, goes to a subset of $\{1, 2, \ldots, n-1\}$ by $x$. In each state of this subset, there is a loop
by $d$ in the nfa for $L$, so the nfa is in the same subset after reading $d^{m-2}$. Then it proceeds as
the nfa for $L'$ and rejects since $xy$ is in $(L')^c$. Thus $x d^i d^{m-2-i} y \in L^c$. On the other
hand, if $i \ne j$, then $x d^i d^{m-2-j} y \notin K$. Now assume that $i = j$, and that $(x, y)$ and
$(w, v)$ are two distinct pairs in $\mathcal{F}'$. Then, without loss of generality, $xv \notin (L')^c$,
and so $xv \in L'$. Thus there exists an accepting computation of the nfa for $L'$ on string $xv$. It
follows that there also exists an accepting computation of the nfa for $L$ on $x d^{m-2} v$ since after
reading $x$ the nfa for $L'$ is in a state in $\{1, 2, \ldots, n-1\}$, in which there is a loop by $d$ in
the nfa for $L$. Therefore, $x d^{m-2} v \in L$, and so $x d^{m-2} v \notin L^c$. Hence $\mathcal{F}$ is a
fooling set for language $K \cap L^c$ of size $(m-1)2^{n-1}$. Now, add one more pair
$(a^{n-2} d^{m-2} c, \varepsilon)$. The resulting set is again a fooling set for $K \cap L^c$. □
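
The three fooling-set conditions required by Lemma 1 in part 1 of this proof can be checked mechanically for small $m$ and $n$. The sketch below encodes $K \cup L$ as a regular expression and the sets $\mathcal{A}$, $\mathcal{B}$ as Python lists; the function names are hypothetical, and the check covers only the lower-bound argument for union.

```python
import re

def is_fooling_set(pairs, member):
    """Conditions (1) and (2) of the fooling-set definition."""
    for i, (x, y) in enumerate(pairs):
        if not member(x + y):
            return False
        for u, v in pairs[:i]:
            if member(x + v) and member(u + y):
                return False
    return True

def union_member(m, n):
    """Membership test for K union L, K = (a^(m-1))* b and L = (b^(n-1))* a."""
    pattern = re.compile(r"(?:a{%d})*b|(?:b{%d})*a" % (m - 1, n - 1))
    return lambda w: pattern.fullmatch(w) is not None

def union_pairs(m, n):
    """The sets A and B and the strings u, v used with Lemma 1."""
    A = ([("a" * (m - 1) + "b", "")]
         + [("a" * i, "a" * (m - 1 - i) + "b") for i in range(1, m - 1)]
         + [("a" * (m - 1), "a" * (m - 1) + "b")])
    B = ([("b" * j, "b" * (n - 1 - j) + "a") for j in range(1, n - 1)]
         + [("b" * (n - 1), "b" * (n - 1) + "a")])
    u = "b" * (n - 1) + "a"
    v = "a" * (m - 1) + "b"
    return A, B, u, v

def lemma_bound(m, n):
    """|A| + |B| + 1 if the three hypotheses of Lemma 1 hold, else None."""
    A, B, u, v = union_pairs(m, n)
    member = union_member(m, n)
    ok = (is_fooling_set(A + B, member)
          and is_fooling_set(A + [("", u)], member)
          and is_fooling_set(B + [("", v)], member))
    return len(A) + len(B) + 1 if ok else None

print(lemma_bound(3, 3), lemma_bound(4, 5))
```

Adding the pair $(\varepsilon, u)$ directly to $\mathcal{A} \cup \mathcal{B}$ does not give a fooling set, since it conflicts with the pair $(b^{n-1}, b^{n-1}a)$; this is why the lemma is applied instead of a single larger fooling set.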

Theorem 7 (Concatenation, Reversal, Star) Let $K$ and $L$ be prefix-free languages with
$\mathrm{nsc}(K) = m$ and $\mathrm{nsc}(L) = n$. Then

1. $\mathrm{nsc}(KL) \le m + n - 1$, and the bound is tight in the unary case;

2. $\mathrm{nsc}(L^R) \le n$, and the bound is tight in the unary case;

3. $\mathrm{nsc}(L^*) \le n$, and the bound is tight in the binary case.

Proof. 1. Since both languages are prefix-free, to get an nfa for their concatenation, we merge the final
state in the nfa for $K$ with the initial state in the nfa for $L$. This gives the upper bound
$m + n - 1$. For tightness, consider the unary prefix-free regular languages $a^{m-1}$ and $a^{n-1}$.
Their concatenation is $a^{m+n-2}$. Every singleton language $a^{k-1}$ is accepted by a $k$-state nfa,
and the nfa is minimal since $\{(a^i, a^{k-1-i}) \mid i = 0, 1, \ldots, k-1\}$ is a fooling set for such
a language.

2. To obtain an $n$-state nfa for the reversal, we reverse all transitions in the nfa for a prefix-free
language $L$, and switch the role of the initial and the sole accepting state. The unary language
$a^{n-1}$ meets the bound.

3. Since language $L$ is prefix-free, we can construct an nfa for language $L^*$ from the nfa for $L$,
with the initial state $s$, final state $f$, and transition function $\delta$, as follows. We make the
final state $f$ initial, thus $\varepsilon$ will be accepted. We add transitions by each symbol $a$ from
state $f$ to $\delta(s, a)$. The resulting $n$-state nfa recognizes $L^*$. For tightness, consider the
binary prefix-free language $L = (b^{n-1})^* a$. Since the set
$\{(\varepsilon, \varepsilon)\} \cup \{(b^i, b^{n-1-i} a) \mid i = 1, 2, \ldots, n-1\}$ is a fooling set
for language $L^*$ of size $n$, every nfa for the star requires $n$ states. □
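
The nfa construction for star in part 3 can be sketched and compared against a regular expression for $L^*$ on all short strings. The names `witness_nfa`, `star_nfa`, `accepts`, and `agrees_with_star` are hypothetical; the nfa for $(b^{n-1})^* a$ is assumed to be a $b$-cycle of length $n-1$ plus one final state.

```python
import re
from itertools import product

def witness_nfa(n):
    """n-state nfa for (b^(n-1))* a: states 0..n-2 form a b-cycle,
    state n-1 is final and has no outgoing transitions."""
    delta = {(q, "b"): {(q + 1) % (n - 1)} for q in range(n - 1)}
    delta[0, "a"] = {n - 1}
    return delta, 0, n - 1

def star_nfa(delta, start, final, alphabet):
    """The final state becomes initial and gets the transitions of the old start state."""
    new = {key: set(val) for key, val in delta.items()}
    for x in alphabet:
        new[final, x] = set(delta.get((start, x), ()))
    return new, final, final

def accepts(delta, start, final, w):
    """Standard nfa simulation with a set of current states."""
    cur = {start}
    for x in w:
        cur = {r for q in cur for r in delta.get((q, x), ())}
    return final in cur

def agrees_with_star(n, max_len=8):
    """Compare the constructed nfa with a regular expression for L* on short strings."""
    nfa = star_nfa(*witness_nfa(n), "ab")
    pattern = re.compile(r"(?:(?:b{%d})*a)*" % (n - 1))
    return all(accepts(*nfa, w) == (pattern.fullmatch(w) is not None)
               for k in range(max_len + 1)
               for w in map("".join, product("ab", repeat=k)))

print(all(agrees_with_star(n) for n in range(2, 6)))
```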

Theorem 8 (Cyclic Shift) Let $n \ge 3$ and let $L$ be a prefix-free regular language with
$\mathrm{nsc}(L) = n$. Then $\mathrm{nsc}(L^{cs}) \le 2n^2 - 4n + 3$. The bound is tight in the binary
case.

Proof. The construction in Theorem 5 gives an $(n-1)(2n-2)$-state nfa for the cyclic shift with a set
$S$ of initial states. To get an nfa with a single initial state, we add a new initial state going to $S$
by the empty string. For tightness, consider the binary language accepted by the nfa of Figure 6. The
proof proceeds by describing a fooling set for the cyclic shift of this language of size
$(n-1)(2n-2)$. Then we use Lemma 1 to prove that one more state is necessary. □
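
The state count in this proof agrees with the bound stated in Theorem 8. As a quick check of the arithmetic, with the extra $1$ accounting for the new initial state:

```latex
\[
  (n-1)(2n-2) + 1 \;=\; \bigl(2n^2 - 4n + 2\bigr) + 1 \;=\; 2n^2 - 4n + 3 .
\]
```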




Figure 6: The prefix-free nfa meeting the bound $2n^2 - 4n + 3$ on cyclic shift.